Method and Apparatus For Improved Coding Mode Selection

ABSTRACT

In this disclosure, a novel method for direct mode enhancement in B-pictures and skip mode enhancement in P-pictures in the framework of H.264 (MPEG-4/Part 10) is disclosed. Direct mode and skip mode enhancements are achieved by clustering the values of the Lagrangian, removing outliers and specifying smaller values of the Lagrangian multiplier in the rate-distortion optimization for encoding mode selection. Experimental results using high quality video sequences show that bit rate reduction is obtained using the method of the present invention, at the expense of a slight loss in peak signal-to-noise ratio (PSNR). By conducting two different experiments, it has been verified that no subjective visual loss is visible despite the peak signal-to-noise ratio change. In relationship to the existing rate-distortion optimization methods currently employed in the (non-normative) MPEG-4/Part 10 encoder, the method of the present invention represents a simple and useful add-on. More importantly, when other solutions such as further increasing the values of the quantization parameter are not applicable, as inadmissible artifacts would be introduced in the decoded pictures, the method of the present invention achieves bit rate reduction without introducing visible distortion in the decoded sequences. Even more, despite the fact that the present document makes use of the H.264 framework, the proposed method is applicable in any video encoding system that employs rate-distortion optimization for encoding mode selection.

RELATED APPLICATIONS

The present patent application claims the benefit of the previous U.S. Provisional Patent Application entitled “Method and Apparatus for Improved Coding Mode Selection” having Ser. No. 60/439,062 that was filed on Jan. 8, 2003.

FIELD OF THE INVENTION

The present invention relates to the field of multi-media compression systems. In particular, the present invention discloses methods and systems for improving the encoding mode selection.

BACKGROUND OF THE INVENTION

Digital based electronic media formats are finally on the cusp of largely replacing analog electronic media formats. Digital compact discs (CDs) replaced analog vinyl records long ago. Analog magnetic cassette tapes are becoming increasingly rare. Second and third generation digital audio systems such as Mini-discs and MP3 (MPEG Audio-layer 3) are now taking market share from the first generation digital audio format of compact discs.

The video media has been slower to move to digital storage and transmission formats than audio. This has been largely due to the massive amounts of digital information required to accurately represent video in digital form. The massive amounts of digital information needed to accurately represent video require very high-capacity digital storage systems and high-bandwidth transmission systems.

However, video is now rapidly moving to digital storage and transmission formats. Faster computer processors, high-density storage systems, and new efficient compression and encoding algorithms have finally made digital video practical at consumer price points. The DVD (Digital Versatile Disc), a digital video system, has been one of the fastest selling consumer electronic products in years. DVDs have been rapidly supplanting Video-Cassette Recorders (VCRs) as the pre-recorded video playback system of choice due to their high video quality, very high audio quality, convenience, and extra features. The antiquated analog NTSC (National Television System Committee) video transmission system is now being replaced with the digital ATSC (Advanced Television Systems Committee) video transmission system.

Computer systems have been using various different digital video encoding formats for a number of years. Among the best digital video compression and encoding systems used by computer systems have been the digital video systems backed by the Moving Picture Experts Group, commonly known by the acronym MPEG. The three most well known and highly used digital video formats from MPEG are known simply as MPEG-1, MPEG-2, and MPEG-4. Video CDs and consumer-grade digital video editing systems use the early MPEG-1 format. Digital Versatile Discs (DVDs) and the Dish Network brand Direct Broadcast Satellite (DBS) television broadcast system use the MPEG-2 digital video compression and encoding system. The MPEG-4 encoding system is rapidly being adopted by the latest computer based digital video encoders and associated digital video players.

SUMMARY OF THE INVENTION

Methods and systems for improving the encoding mode selection are disclosed. In this disclosure, a novel method for direct mode enhancement in B-pictures and skip mode enhancement in P-pictures in the framework of H.264 (MPEG-4/Part 10) is disclosed.

Direct mode and skip mode enhancements are achieved by making a number of changes to the existing compression systems. Specifically, the system of the present invention introduces the steps of removing outliers in the distortion values, specifying smaller values for the Lagrangian multiplier in the rate-distortion optimization for encoding mode selection, and clustering the values of the Lagrangian before encoding mode selection. In one embodiment, the Huber cost function is used to compute the distortion for the different encoding modes in order to remove outliers. In one embodiment of the present invention, the system changes the Lagrangian multiplier to vary more slowly as a function of the Quantizer value Q than the reference H.264 (MPEG-4/Part 10) implementation. The Lagrangian clustering is used to favor the mode 0 encoding mode for bit rate reduction.

Experimental results using high quality video sequences show that bit rate reduction is obtained using the method of the present invention, at the expense of a slight loss in peak signal-to-noise ratio (PSNR). By conducting two different experiments, it has been verified that no subjective visual loss is visible despite the peak signal-to-noise ratio change.

In relationship to the existing rate-distortion optimization methods currently employed in the (non-normative) MPEG-4/Part 10 encoder, the method of the present invention represents a simple and useful add-on. More importantly, when other solutions such as further increasing the values of the quantization parameter are not applicable, as inadmissible artifacts would be introduced in the decoded pictures, the method of the present invention achieves bit rate reduction without introducing visible distortion in the decoded sequences.

Other objects, features, and advantages of the present invention will be apparent from the accompanying drawings and from the following detailed description.

BRIEF DESCRIPTION OF THE DRAWINGS

The objects, features, and advantages of the present invention will be apparent to one skilled in the art, in view of the following detailed description in which:

FIG. 1 graphically illustrates the Huber cost function of a variable r.

FIG. 2A illustrates a variation of original and modified Lagrangian multiplier λ_(mode) as a function of the quantization parameter (Q) values in the range of interest.

FIG. 2B illustrates a variation of original and modified Lagrangian multiplier λ_(mode) for B-frames as a function of the quantization parameter (Q) values in the range of interest.

FIG. 2C illustrates a variation of original and modified Lagrangian multiplier λ_(motion) as a function of the quantization parameter (Q) values in the range of interest.

FIG. 3 illustrates a flow diagram that sets forth how an encoding mode may be selected.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

A method and system for improving the encoding mode selection is disclosed. In the following description, for purposes of explanation, specific nomenclature is set forth to provide a thorough understanding of the present invention. However, it will be apparent to one skilled in the art that these specific details are not required in order to practice the present invention.

Introduction

The emerging H.264 video encoding standard, also known as MPEG-4/Part 10, Joint Video Team (JVT), Advanced Video Coding (AVC), and H.26L, has been developed jointly by the Moving Picture Experts Group (MPEG) and the International Telecommunication Union (ITU) with the goal to provide higher compression of moving pictures than state-of-the-art video encoding systems that are compliant with existing MPEG standards. Target applications of H.264, which is expected to become an international standard in 2003, include (but are not limited to) video conferencing, digital storage media, television broadcasting, internet streaming and communication.

Similar to other video encoding standards (in their main body or annexes), the H.264 standard employs a rate-distortion (RD) decision framework. In particular, the H.264 standard employs rate-distortion optimization for encoding mode selection and motion estimation. In this disclosure, the primary focus is on encoding mode selection within the framework of the H.264 standard.

In most digital video encoding systems, each video frame of a video sequence is divided into subsets of pixels, where the subsets of pixels are called pixelblocks. In the H.264 standard, the pixelblocks may have various sizes. (The pixelblock with a size equal to 16×16 pixels is traditionally known as a macroblock.) The encoding mode selection problem may be informally defined as “select the best of all possible encoding methods (or encoding modes) to encode each pixelblock in the video frame.” The encoding mode selection problem may be solved by the video encoder in a number of different manners. One possible method of solving the encoding mode selection problem is to employ rate-distortion optimization.

There are numerous different encoding modes that may be selected to encode each pixelblock within the framework of the H.264 video encoding standard. Mode 0 is known as ‘direct mode’ in B-frames and as ‘skip mode’ in P-frames. Other encoding modes employ pixelblocks of sizes equal to 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4 pixels in B-pictures or P-pictures.

In direct mode (mode 0 in B-pictures), no motion information is transmitted to the decoder. Instead, a predictive system is used to generate motion information. Therefore, the direct mode can provide important bit rate savings for sequences that allow good motion vector predictions using neighboring spatial or temporal information. However, experimental evaluations have shown that the direct mode selection in H.264 does not yield as many selected pixelblocks as expected for some video sequences.

This disclosure proposes a method for enhancing the direct mode (mode 0) selection in Bidirectional predicted pictures (known as B-pictures or B-frames) within the framework of the H.264 standard. When applied to P-frames, the encoding method of the present invention achieves enhancement of the skip mode (also mode 0) selection. Direct mode and skip mode enhancements are achieved by clustering the Lagrangian values, removing outliers and specifying smaller values of the Lagrangian multiplier in the rate-distortion optimization for the encoding mode selection.

Experimental results using high quality sample video sequences illustrate that the bit rates of the compressed bitstreams from the present invention are reduced as compared to compressed bitstreams obtained using the reference H.264 codec. This bit rate reduction is associated with a slight loss in the peak signal-to-noise ratio (PSNR) of the bitstream. However, two test experiments verify that no subjective visual loss is associated with the change in the peak signal-to-noise ratio. More importantly, when other possible solutions such as further increasing the values of the quantization parameter are not applicable, since unacceptable artifacts would be introduced in the decoded pictures, the method of the present invention achieves significant further bit rate reduction without introducing visible distortion in the decoded video sequences. Furthermore, despite the fact that the present invention makes use of the H.264 framework, the encoding method of the present invention is applicable in any video encoding system that employs rate-distortion optimization.

The remainder of this document is organized as follows. A video compression overview section first presents basic ideas related to the rate-distortion optimization framework within the H.264 standard. The encoding method proposed by the present invention is then set forth in detail in the proposed direct mode enhancement method section. Finally, a set of experimental results and conclusions are provided in the experimental results section and the conclusions section, respectively.

Video Compression Overview

As set forth earlier in this document, each video frame is divided into sets of pixelblocks in the H.264 standard. These pixelblocks may be encoded using motion compensated predictive encoding. A predicted pixelblock may be an Intra (I) pixelblock (an I-pixelblock) that uses no information from preceding pictures in its encoding, a unidirectionally Predicted (P) pixelblock (a P-pixelblock) that uses information from one preceding picture, or a Bidirectionally Predicted (B) pixelblock (a B-pixelblock) that uses information from one preceding picture and one future picture.

For each P-pixelblock in a P-picture, one motion vector is computed. (Note that, within each video picture, the pixelblocks may be encoded in many ways. For example, a pixelblock may be divided into smaller subblocks, with motion vectors computed and transmitted for each subblock. The shape of the subblocks may vary and not be square.) Using the computed motion vector, a prediction pixelblock can be formed by a translation of pixels in the aforementioned previous picture. The difference between the actual pixelblock in the video picture and the prediction pixelblock is then encoded for transmission. (The difference is used to correct minor differences between the predicted pixelblock and the actual pixelblock.)

Each motion vector may also be transmitted via predictive encoding. That is, a prediction for a motion vector is formed using nearby motion vectors that have already been transmitted, and then the difference between the actual motion vector and the predicted motion vector is encoded for transmission.

For each B-pixelblock, two motion vectors are typically computed, one motion vector for the aforementioned previous picture and one motion vector for the future picture. (Note that within a P-picture or B-picture, some pixelblocks may be better encoded without using motion compensation. Such pixelblocks may be encoded as Intra-pixelblocks. Within a B-picture, some pixelblocks may be better encoded using forward or backward unidirectional motion compensation. Such pixelblocks may be encoded as forward predicted or backward predicted depending on whether a previous picture or a future picture is used in the prediction.) From the two B-pixelblock motion vectors, two prediction pixelblocks are computed. The two prediction pixelblocks are then combined together to form a final prediction pixelblock. As above, the difference between the actual pixelblock in the video picture and the prediction block is then encoded for transmission.

As with P-pixelblocks, each motion vector of a B-pixelblock may be transmitted via predictive encoding. That is, a prediction motion vector may be formed using nearby motion vectors that have already been transmitted. The difference between the actual motion vector and the prediction motion vector is then encoded for transmission.

However, with B-pixelblocks the opportunity also exists for interpolating the motion vectors from those in the collocated or nearby pixelblocks of the stored pictures. (When the motion vector prediction is constructed using motion vectors of the collocated blocks of the current pixelblock, the direct mode type is known as the temporal direct mode. When the motion vector prediction is constructed using spatial neighbors of the current pixelblock, the direct mode type is known as the spatial direct mode.) The interpolated value may then be used as a prediction motion vector and the difference between the actual motion vector and the prediction motion vector encoded for transmission. Such interpolation is carried out both in the encoder and decoder. (Note that an encoder always has a decoder so the encoder will know exactly how a reconstructed video picture will appear.)

In some cases, the interpolated motion vector is good enough to be used without any correction difference, in which case no motion vector data needs to be transmitted at all. This is referred to as Direct Mode in the H.264 (and H.263) standard. Direct mode selection is particularly effective when the recording camera is slowly panning across a stationary background. In fact, the motion vector interpolation may be good enough to be used as is, which means that no differential information need be transmitted for these B-pixelblock motion vectors. In skip mode (mode 0 in P-pictures), the motion vector prediction is constructed identically as in the 16×16 direct mode such that no transmission of motion vector bits is carried out.

Prior to transmission, the prediction error (the difference) of a pixelblock or subblock is typically transformed, quantized and entropy encoded to reduce the number of bits. The prediction error, which is computed as the mean square error between the original desired pixelblock and the decoded prediction pixelblock after encoding using direct mode, is encoded in direct mode. However, the prediction error is not encoded and transmitted in skip mode. The subblock size and shape used for the transform may not be the same as the subblock size and shape used for motion compensation. For example, 8×8 pixels or 4×4 pixels are commonly used for transforms, whereas 16×16 pixels, 16×8 pixels, 8×16 pixels or smaller sizes are commonly used for motion compensation. The motion compensation and transform subblock sizes and shapes may vary from pixelblock to pixelblock.

The selection of the best encoding mode to encode each pixelblock is one of the decisions in the H.264 standard that has a very direct impact on the bit rate R of the compressed bitstream, as well as on the distortion D in the decoded video sequence. The goal of encoding mode selection is to select the encoding mode M* that minimizes the distortion D(p̄) subject to a bit rate constraint of R(p̄) ≤ R_(max), where p̄ is the vector of adjustable encoding parameters and R_(max) is the maximum allowed bit rate. This constrained optimization problem may be transformed into an unconstrained optimization problem using the Lagrangian equation J(p̄, λ) given by:

$J(\bar{p}, \lambda) = D(\bar{p}) + \lambda R(\bar{p}) \qquad (1)$

where λ is the Lagrangian multiplier which controls the rate-distortion tradeoff. The encoding mode decision problem becomes the minimization of J(p̄, λ). This may be expressed in the following equation:

$\min_{\text{all } \bar{p}} \left\{ D(\bar{p}) + \lambda R(\bar{p}) \right\} \qquad (2)$

The preceding Lagrangian equation may be evaluated by performing the following steps for each admissible encoding mode:

(a) Compute the distortion D as the L₂ norm of the error between the original pixelblock and the reconstructed pixelblock after encoding and decoding using a specific encoding mode;

(b) Compute the bit rate R as the total number of bits that are necessary to encode the motion vectors and the transform coefficients;

(c) Compute the Lagrangian J using equation (1).

Finally, the minimum Lagrangian J obtained after computing the Lagrangian J values for all encoding modes indicates the encoding mode M* that solves the minimization expressed by equation (2).
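The evaluation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the reference H.264 encoder: the function names, the flat-sequence pixel representation, and the sample distortion and rate figures are hypothetical, and the distortion is taken to be the sum of squared pixel errors.

```python
from typing import Dict, Sequence, Tuple


def squared_error_distortion(original: Sequence[float], decoded: Sequence[float]) -> float:
    """Step (a): sum of squared pixel errors (an assumed reading of the L2-norm distortion)."""
    return sum((o - d) ** 2 for o, d in zip(original, decoded))


def lagrangian_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Step (c): J = D + lambda * R, as in equation (1)."""
    return distortion + lam * rate_bits


def select_mode_rd(candidates: Dict[int, Tuple[float, float]], lam: float) -> int:
    """Return the mode with the smallest Lagrangian, as in equation (2).

    candidates maps a mode index to (distortion D, bit count R). The bit count
    of step (b) is assumed to be supplied by the encoder.
    """
    costs = {mode: lagrangian_cost(d, r, lam) for mode, (d, r) in candidates.items()}
    return min(costs, key=costs.get)


# Hypothetical (D, R) pairs for three candidate modes of one pixelblock.
example = {0: (900.0, 4.0), 1: (700.0, 60.0), 2: (650.0, 110.0)}
print(select_mode_rd(example, lam=5.0))  # a large multiplier favors the cheap mode 0
print(select_mode_rd(example, lam=0.5))  # a small multiplier favors the low-distortion mode 2
```

With these made-up figures, the same three candidates yield different winners depending only on λ, which is the rate-distortion tradeoff that the multiplier controls.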

Note that, in the H.264 video compression standard, the encoding mode decision is performed using 8×8 and smaller pixelblocks prior to the encoding mode decision for the larger pixelblocks. Furthermore, note that in an effort to reduce the complexity of the optimization process, the minimization determination is carried out with a fixed Quantizer value Q, and the Lagrange multiplier is often selected to be equal to (for instance) 0.85×Q/2 or 0.85×2^(Q/3), where Q is the quantization parameter. For multiple B-pictures, much larger values are often chosen. Of course, this complexity reduction also restricts the search for the minimum value of the Lagrangian J in the rate-distortion plane.
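To give a feel for how quickly such a multiplier grows, the short sketch below evaluates the second example expression quoted above, 0.85×2^(Q/3). The function name and the sample Q values are assumptions chosen for illustration only.

```python
def reference_lambda_mode(q: float) -> float:
    """Example multiplier quoted in the text: 0.85 * 2^(Q/3)."""
    return 0.85 * 2.0 ** (q / 3.0)


# Hypothetical quantization parameter values.
for q in (24, 27, 30):
    print(q, reference_lambda_mode(q))
```

Under this expression the multiplier doubles every time Q increases by three, so the weight placed on the bit rate term grows exponentially with Q.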

Proposed Direct Mode Enhancement Method

The system of the present invention proposes a method for enhancement of direct mode selection in B-frames and skip mode selection in P-frames. The system of the present invention employs a clustering of cost values, outlier reduction, and specification of the Lagrange multiplier. In one embodiment, the system performs the method using four steps. The following text provides a detailed recitation of these method steps with reference to FIG. 3.

First, the current pixelblock is both encoded and decoded for each possible encoding mode M and the distortion D_(M) is computed as set forth in steps 310 and 320. In one embodiment, the distortion D_(M) is computed as the sum of the Huber function values of the errors between the pixels in the original pixelblock and the pixels in the decoded pixelblock. The Huber function, which is illustrated in FIG. 1, is given by the following equation:

${D_{M}(x)} = \left\{ \begin{matrix}{{\frac{1}{2}x^{2}},} & {{x} \leq \beta} \\{{{\beta {x}} - {\frac{1}{2}\beta^{2}}},} & {{x} > \beta}\end{matrix} \right.$

where x is the error for one pixel of the pixelblock and β is aparameter. Clearly, for error values that are smaller than β, the valueof the Huber function is equal to that given by the square error. Forerror values that are larger than β, the value of the Huber function issmaller than that of the square error for the same error value.
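The Huber distortion described above can be sketched as follows. This is a minimal illustration, assuming that pixelblocks are passed as flat sequences of sample values; the function names and the choice β = 4 in the usage lines are hypothetical.

```python
from typing import Sequence


def huber(x: float, beta: float) -> float:
    """Huber function of a single pixel error x with threshold beta."""
    ax = abs(x)
    if ax <= beta:
        return 0.5 * ax * ax              # quadratic region, same as the square error term
    return beta * ax - 0.5 * beta * beta  # linear region, grows more slowly for outliers


def huber_distortion(original: Sequence[float], decoded: Sequence[float], beta: float) -> float:
    """Distortion D_M: sum of Huber values of the per-pixel errors."""
    return sum(huber(o - d, beta) for o, d in zip(original, decoded))


# Hypothetical three-pixel block with one outlier error of -20.
print(huber_distortion([10, 20, 30], [9, 20, 50], beta=4.0))
```

For the outlier error of magnitude 20 with β = 4, the quadratic term ½x² would contribute 200, whereas the Huber function contributes 72, which is how the outlier's influence on the mode decision is reduced.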

Second, the bit rate R for each encoding mode is computed as set forth in step 330. In one embodiment, the system computes the bit rate R as the total number of bits that are necessary to encode the motion vectors and transform coefficients of the pixelblock.

Third, the system of the present invention computes the Lagrangian for the encoding mode using equation (1) as set forth in step 340. In one embodiment, the system selects the value of the Lagrangian multiplier λ to be slower varying as a function of the quantization parameter than the original Lagrangian multiplier λ proposed in the non-normative part of the H.264 standard version 4.1. The proposed variation of the Lagrangian multiplier λ as a function of the Quantizer Q is illustrated in FIGS. 2A, 2B, and 2C. By making the Lagrangian multiplier λ vary slower than the λ in the reference implementation, the system of the present invention places less emphasis on the bit rate component R of the Lagrangian equation (1) and thus more emphasis on the distortion component D. As a result of this change to the Lagrangian multiplier λ, slight increases in the bit rate R will have less effect on the output Lagrangian value of J. (This will also reduce the effect that the bit rate R has on the Lagrangian clustering set forth in the following paragraph.)

Fourth, let J_(M)* be the minimum value of J for all J_(M) (using equation (1)), where M is one of the possible encoding modes. Instead of selecting the encoding mode (M*) as that which yields J_(M)*, the system clusters the values of the computed Lagrangians J_(M) as follows. Let S be the set of encoding modes k for which the computed Lagrangian values satisfy the condition:

$S = \left\{ k \;\middle|\; \frac{J^{*}}{J_{k}} \geq \varepsilon \right\} \qquad (3)$

where epsilon (‘ε’) is a selected error value and J* is the minimum J for all modes. Because J* never exceeds J_(k), the ratio is at most 1 for positive Lagrangian values, so a value of ε closer to 1 admits fewer encoding modes into the set S. If encoding mode 0 is a member of the set S, then the system selects encoding mode 0 as the encoding mode that will be used to encode the pixelblock; otherwise, the system selects the encoding mode M* that corresponds to J_(M)* (the encoding mode M* that yields the smallest J value).
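The clustering rule of equation (3) can be sketched as follows. This is a minimal illustration assuming strictly positive Lagrangian values; the function name and the sample costs are hypothetical, and the ratio test J*/J_k ≥ ε is written in the equivalent form J* ≥ ε·J_k to avoid a division.

```python
from typing import Dict


def select_mode_clustered(costs: Dict[int, float], epsilon: float) -> int:
    """Pick mode 0 if its Lagrangian lies in the cluster S, else the minimum-cost mode.

    costs maps each mode index to its Lagrangian J_k (assumed positive).
    """
    best_mode = min(costs, key=costs.get)
    j_star = costs[best_mode]
    # Set S of equation (3): modes whose cost is close enough to the minimum.
    cluster = {k for k, j in costs.items() if j_star >= epsilon * j}
    return 0 if 0 in cluster else best_mode


# Hypothetical Lagrangian values: mode 1 is the strict minimum.
costs = {0: 1000.0, 1: 950.0, 2: 1200.0}
print(select_mode_clustered(costs, epsilon=0.90))  # 950/1000 = 0.95 >= 0.90, so mode 0
print(select_mode_clustered(costs, epsilon=0.97))  # 0.95 < 0.97, so the minimum, mode 1
```

The usage lines show the intended bias: mode 0 is chosen whenever its cost is within the tolerance set by ε of the best cost, even though another mode has a strictly smaller Lagrangian.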

The above steps make use of novel components as compared to the reference (non-normative) H.264 encoder. Specifically, the present invention makes use of the Huber cost function to compute distortion, modified Lagrangian multipliers, and clustering of the Lagrangian values.

The Huber cost function belongs to the class of robust M-estimators. An important property of these functions is their ability to reduce the impact of the outliers. More specifically, if any outliers exist within a pixelblock, the Huber cost function weights them less (linearly) than the mean square error function would (quadratically), in turn allowing the encoding mode selected for that pixelblock to be perhaps identical to that of the neighboring macroblocks.

The modified Lagrangian multiplier λ varies slower as a function of the quantization parameter Q and therefore places more emphasis on the distortion component of the Lagrangian J than on the bit rate component R. (In this document, ‘lambda’ or the lambda symbol ‘λ’ denotes the Lagrangian multiplier that is used in the encoding mode decision process. The multiplier that is used in the motion vector selection process is different.)

Finally, the clustering of the Lagrangian values described earlier favors encoding mode 0. Consequently, the system of the present invention allows more pixelblocks to be encoded using direct mode or skip mode for B-pixelblocks and P-pixelblocks, respectively.

Experimental Results

The video test set employed in the experiments consists of nine color video clips from movie sequences “Discovering Egypt”, “Gone with the Wind”, and “The English Patient”. The specific characteristics of these video sequences are set forth in Table 1.

TABLE 1
Test sequences. (The abbreviations ch and Og stand for chapter and opposing glances, respectively.)

Seq. No.   Video sequence name           Frame size   No. frames   Type
1          Discovering Egypt, ch. 1      704 × 464    58           Pan
2          Gone with the Wind, ch. 11    720 × 480    44           Og
3          Discovering Egypt, ch. 1      704 × 464    630          Pan
4          Discovering Egypt, ch. 2      704 × 464    148          Zoom
5          Discovering Egypt, ch. 3      704 × 464    196          Boom
6          Discovering Egypt, ch. 6      704 × 464    298          Pan
7          The English Patient, ch. 2    720 × 352    97           Texture
8          The English Patient, ch. 6    720 × 352    196          Og
9          The English Patient, ch. 8    720 × 352    151          Og

The video frames are represented in YUV format and the video frame rate is equal to 23.976 frames per second (fps) for all of the video sequences. The effectiveness of the method proposed by the present invention has been evaluated using the bit rate R of the compressed video sequences and the visual quality of the decoded video sequences. The latter is evaluated by visual inspection of the video sequences and the peak signal-to-noise ratio (PSNR) values.
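For reference, the peak signal-to-noise ratio used as the objective quality measure can be sketched as below. This is the conventional definition rather than anything specific to the present method; the function name, the flat-sequence pixel representation, and the 8-bit peak value of 255 are assumptions, since the bit depth of the test sequences is not stated here.

```python
import math
from typing import Sequence


def psnr(original: Sequence[float], decoded: Sequence[float], peak: float = 255.0) -> float:
    """PSNR in dB: 10 * log10(peak^2 / MSE); infinite when the signals are identical."""
    mse = sum((o - d) ** 2 for o, d in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / mse)


# Hypothetical four-sample signal with an error of magnitude 1 on every sample.
print(psnr([100, 100, 100, 100], [101, 99, 101, 99]))
```

Because the measure is logarithmic in the mean square error, the PSNR differences of a few tenths of a dB reported in the tables correspond to small relative changes in mean square error.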

The novel components in the encoding method of the present invention described in the Proposed Direct Mode Enhancement Method section complement each other in terms of their impact on the rate and distortion. The method of the present invention yields an overall bit rate reduction as well as a slight peak signal-to-noise reduction. The system of the present invention has been evaluated using two experiments that are described in the following section of text.

Fixed Quantization Parameter for All Sequences

The first experiment selects the quantization parameter Q to be the same for all of the video sequences and to be equal to Q, Q+1, Q+3 for the I-frames, P-frames, and B-frames, respectively. As set forth in Table 2, the bit rate reduction may be as high as 9% when using the encoding method of the present invention, whereas the loss in peak signal-to-noise ratio (PSNR) is around 0.12 dB. No distortion is visible in the video sequences encoded using the encoding method of the present invention as compared to those encoded using the reference method.

TABLE 2
Bit rate (BR) [kbits/sec] and peak signal-to-noise ratio (PSNR) [dB] for movie sequences using the reference method and the proposed method using the same quantization parameter Q for all of the sequences.

           Reference Method           Proposed Method
Seq. No.   Bit rate       PSNR        Bit rate               PSNR
           [kbits/sec]    [dB]        [kbits/sec]            [dB]
1          162.04         38.89       155.43 (−4.08%)        38.75 (−0.13 dB)
2          287.71         39.82       283.35 (−1.51%)        39.71 (−0.11 dB)
3          659.14         37.32       650.92 (−1.24%)        37.20 (−0.12 dB)
4          1029.02        35.84       1012.17 (−1.63%)       35.76 (−0.07 dB)
5          390.46         36.77       354.25 (−9.27%)        36.59 (−0.18 dB)
6          144.82         39.11       139.02 (−4.00%)        39.02 (−0.09 dB)
7          257.06         37.30       255.08 (−0.76%)        37.12 (−0.18 dB)
8          102.75         40.17       99.81 (−2.85%)         40.03 (−0.13 dB)
9          222.29         39.62       218.48 (−1.71%)        39.50 (−0.12 dB)

Max BR Change: −9.27%
Min BR Change: −0.76%
Avg. BR Change: −3.00%
Max PSNR Gain: 0 dB
Max PSNR Loss: −0.183 dB
Avg. PSNR Change: −0.128 dB
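The percentage figures in parentheses are relative bit rate changes with respect to the reference method. As a small worked check, the sketch below recomputes two of them from the bit rates listed in Table 2; the function name is an assumption introduced for the example.

```python
def percent_change(reference: float, proposed: float) -> float:
    """Relative change of the proposed value with respect to the reference, in percent."""
    return 100.0 * (proposed - reference) / reference


# Bit rates [kbits/sec] of sequences 1 and 5 as listed in Table 2.
print(round(percent_change(162.04, 155.43), 2))
print(round(percent_change(390.46, 354.25), 2))
```

These two lines reproduce the tabulated changes of −4.08% and −9.27% for sequences 1 and 5.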

The Highest Quantization Parameter for Each Sequence

To further evaluate the usefulness of the encoding method of the present invention, a second experiment was designed and conducted. A general argument when both the bit rate R and the peak signal-to-noise ratio values decrease is that many methods, such as pre-filtering of video sequences or increasing the values of the quantizer Q, can yield similar results. The goal in this experiment is to show that, when these solutions cannot be applied further without impairing the video quality unacceptably, the method of the present invention can further reduce the bit rate.

First, for each test video sequence, the bit rate is reduced as much as possible using the reference method by increasing the values of the quantization parameter up to Q_(max)+1, the value at which distortion becomes visible. Next, the system encodes and decodes the video sequence using Q_(max) (the maximum value for which distortion is not yet visible) and the reference method, yielding the bit rates and peak signal-to-noise ratio (PSNR) values included in Table 3. For each sequence, the value of Q_(max) is different and it is also different for the I-frames, P-frames, and B-frames, respectively. Given this maximum achievable bit rate reduction at no visual loss, the encoding method of the present invention is then applied for encoding the sequences at the same value Q_(max).

TABLE 3
Bit rate (BR) [kbits/sec] and peak signal-to-noise ratio (PSNR) [dB] for movie sequences using the reference method and the proposed method using the highest quantization parameters.

           Reference Method           Proposed Method
Seq. No.   Bit rate       PSNR        Bit rate               PSNR
           [kbits/sec]    [dB]        [kbits/sec]            [dB]
1          512.59         41.39       479.52 (−6.45%)        41.15 (−0.24 dB)
2          316.70         40.10       298.86 (−5.63%)        39.89 (−0.21 dB)
5          238.78         35.74       210.40 (−11.33%)       35.18 (−0.56 dB)
6          169.28         39.46       146.75 (−13.30%)       39.10 (−0.36 dB)
7          300.56         37.78       290.67 (−3.28%)        37.50 (−0.28 dB)
9          276.91         40.45       270.56 (−2.30%)        40.31 (−0.14 dB)

Max BR Change: −13.30%
Min BR Change: −2.30%
Avg. BR Change: −7.04%
Max PSNR Gain: 0 dB
Max PSNR Loss: −0.56 dB
Avg. PSNR Change: −0.29 dB

As set forth in Table 3, the method of the present invention is furtherable to reduce the bit rate significantly by up to 13.3% for peaksignal-to-noise ratio (PSNR) loss around 0.29 dB. By visual inspectionof the sequences at full frame rate (in order to evaluate any B-framerelated artifacts), one can determine that this bit rate reduction doesnot introduce visible artifacts in the decoded video sequences. Notethat, one may increase the value of the quantization parameter aboveQ_(max) when using the method of the present invention and obtain morebit rate reduction without visual loss.

Conclusions

The present invention has proposed a method for direct mode enhancementin B-pictures and skip mode enhancement in P-pictures in the frameworkof the H.264 (MPEG-4/Part 10) video compression standard. The system ofthe present invention makes use of a Huber cost function to computedistortion, modified Lagrangian multipliers, and clustering of theLagrangian values to select the encoding mode that will be used toencode a pixelblock. Tests have shown that significant bit ratereduction is obtained using the method of the present invention at aslight loss in peak signal-to-noise ratio (PSNR) yet with no subjectivevisual quality degradation. These features make the method of thepresent invention particularly useful for bit rate reduction in anyvideo encoding system that employs a rate-distortion optimizationframework for encoding mode decision, as an add-on when other solutionssuch as further increasing the values of the quantization parameter arenot applicable more.

The foregoing has described a method and apparatus for performing digital image enhancement. It is contemplated that changes and modifications may be made by one of ordinary skill in the art to the materials and arrangements of elements of the present invention without departing from the scope of the invention.

1-19. (canceled)
20. A method for selecting an encoding mode from a plurality of encoding modes, the method comprising: for each encoding mode from the plurality of encoding modes: encoding and decoding a particular array of pixels with the encoding mode; computing error values between pixels in the particular array of pixels and corresponding pixels in a decoded array of pixels; from the particular array of pixels, selecting a subset of pixels that have error values that satisfy a threshold; for the particular array of pixels, computing a distortion value based on the error values for the subset of pixels; computing a bit rate value; and computing a Lagrangian value based on (i) said bit rate value, (ii) a Lagrangian multiplier, and (iii) said distortion value for the particular array of pixels; and selecting a particular encoding mode based on the computed Lagrangian values.
21. The method of claim 20, wherein selecting the subset of pixels from the particular array of pixels comprises using a Huber function.
22. The method of claim 20, wherein the bit rate value is based on a total number of bits required to encode a plurality of motion vectors.
23. The method of claim 20, wherein the bit rate value is based on a total number of bits required to encode a plurality of transform coefficients.
24. The method of claim 20 further comprising encoding a plurality of video pictures using the selected particular encoding mode.
25. The method of claim 20, wherein the plurality of encoding modes includes a mode 0 encoding, wherein selecting the particular encoding mode comprises: grouping Lagrangian values that are within a threshold value of a smallest computed Lagrangian value into a cluster; and performing mode selection using said Lagrangian values by (i) selecting the mode 0 encoding when a Lagrangian value computed for said mode 0 encoding is in said cluster and (ii) selecting an encoding mode based on the smallest computed Lagrangian value when the Lagrangian value computed for said mode 0 encoding is not in said cluster.
26. The method of claim 25, wherein the mode 0 encoding is either a direct mode for encoding a bidirectionally predicted array of pixels or a skip mode for encoding a unidirectionally predicted array of pixels.
27. A method of performing mode selection in a video compression and encoding system, said method comprising: encoding and decoding a pixel block with each possible encoding mode; computing a distortion value for each encoding mode; computing a bit rate value for each encoding mode; computing a Lagrangian value for each encoding mode using said distortion value, said bit rate value, and a slow varying Lagrangian multiplier that varies slower as a function of a quantization value than a standard Lagrangian multiplier in order to emphasize the distortion value over the bit rate value when computing the Lagrangian value; and selecting an encoding mode by using the computed Lagrangian values.
28. The method as claimed in claim 27, wherein said distortion value is computed by using a function that reduces the effects of outliers on the distortion value.
29. The method as claimed in claim 28, wherein the function comprises a Huber function.
30. The method as claimed in claim 27, wherein computing said bit rate value comprises computing a total number of bits that are necessary to encode a set of motion vectors and a set of transform coefficients.
31. A method for selecting an encoding mode from a plurality of encoding modes, the method comprising: for each encoding mode from the plurality of encoding modes: computing a distortion value for the encoding mode by using a function that reduces the effects of outliers on the distortion value; computing a bit rate value for the encoding mode; and computing a Lagrangian value based on (i) the distortion value for the encoding mode, (ii) the bit rate value for the encoding mode, and (iii) a slow varying Lagrangian multiplier that varies as a function of a quantization parameter at a slower rate than a reference Lagrangian multiplier in order to emphasize the distortion value over the bit rate value when computing the Lagrangian value; from the plurality of encoding modes, selecting an encoding mode based on the computed Lagrangian values; and using the selected encoding mode to encode a plurality of video pictures.
32. The method of claim 31, wherein the reference Lagrangian multiplier is specified by the H.264 standard.